Half-Context Language Models
Abstract
This article investigates the effects of different degrees of contextual granularity on language model performance. It presents a new language model that combines clustering and half-contextualization, a novel representation of contexts. Half-contextualization is based on the half-context hypothesis, which states that the distributional characteristics of a word or bigram are best represented by treating its context distributions to the left and to the right separately, and that only directionally relevant distributional information should be used. Clustering is achieved using a new clustering algorithm for class-based language models that compares favorably to the exchange algorithm. When interpolated with a Kneser-Ney model, half-context models are shown to have better perplexity than commonly used interpolated n-gram models and traditional class-based approaches. A novel, fine-grained, context-specific analysis highlights the contexts in which the model performs well and those that are better treated by existing non-class-based models.
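To make the half-context idea concrete, the following minimal sketch keeps the left and right context distributions of each word in separate tables instead of pooling them. It illustrates only that separation and is not the article's model: the function name `half_context_distributions`, the one-word context window, and the toy corpus are all assumptions made for this example, and clustering and Kneser-Ney interpolation are not shown.

```python
from collections import Counter, defaultdict


def half_context_distributions(tokens):
    """Return (left, right) relative-frequency tables.

    left[w][c]  = share of occurrences of w that are immediately preceded by c
    right[w][c] = share of occurrences of w that are immediately followed by c
    Assumption: a context is a single adjacent word (hypothetical simplification).
    """
    left_counts = defaultdict(Counter)
    right_counts = defaultdict(Counter)

    # Walk over adjacent token pairs and record each direction separately.
    for prev, nxt in zip(tokens, tokens[1:]):
        right_counts[prev][nxt] += 1  # nxt appears to the right of prev
        left_counts[nxt][prev] += 1   # prev appears to the left of nxt

    def normalize(counts):
        # Turn raw counts into relative frequencies per word.
        return {
            word: {ctx: n / sum(ctr.values()) for ctx, n in ctr.items()}
            for word, ctr in counts.items()
        }

    return normalize(left_counts), normalize(right_counts)


# Toy corpus, invented for illustration only.
tokens = "the cat sat on the mat the cat ran".split()
left, right = half_context_distributions(tokens)

print(left["cat"])   # left half-context of "cat"
print(right["cat"])  # right half-context of "cat"
```

In this toy corpus, "cat" is always preceded by "the" but is followed by two different words, so its left and right half-contexts have different shapes. A class-based model built on half-contexts could, under this reading, group words by one direction without being influenced by the other.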
Journal: Computational Linguistics
Volume: 37
Publication date: 2011